Speech recognition performance of CJLC: corpus of Japanese lecture contents
نویسندگان
چکیده
This paper discusses the speech recognition of Japanese classroom lecture speech. In particular, we mention the influences of microphone differences and the language model differences on the speech recognition performance of classroom lectures. First, we collected actual classroom lecture contents from several universities in Japan. In this paper, we recorded the lecture speech using lapel microphones because lapel microphones are more commonly used to record lectures. LVCSR is one of the essential technologies for adding tag information to such lecture speech. Next, therefore, we researched the influence of the differences between microphones used for recording lecture on speech recognition performance. Finally, seven types of language models that were trained using three types of corpora were compared on the basis of their ability to lecture speech.
منابع مشابه
Developing Corpus of Japanese Classroom Lecture Speech Contents
This paper explains our developing Corpus of Japanese classroom Lecture speech Contents (henceforth, denoted as CJLC). Increasing e-Learning contents demand a sophisticated interactive browsing system for themselves, however, existing tools do not satisfy such a requirement. Many researches including large vocabulary continuous speech recognition and extraction of important sentences against le...
متن کاملExtension of the LECTRA corpus: classroom LECture TRAnscriptions in European Portuguese
This paper presents the recent extension of the LECTRA corpus, a speech corpus of university lectures in European Portuguese that will be partially available. Eleven additional hours of various lectures were transcribed, following the previous multilayer annotations, and now comprising about 32 hours. This material can be used not only for the production of multimedia lecture contents for e-lea...
متن کاملA Structure-based Method for Speech Summarization
This paper proposes a model and system for speech summarization, aimed at selectively listening to speci c contents in the entire lecture speech data. Our system uses both a target speech and its corresponding paper. Papers are used to identify contents where users are interested, based on structure/surface information. On the other hand, speech is e ective to deeply understand speci c contents...
متن کاملEfficient Access to Lecture Audio Archives through Spoken Language Processing
The paper firstly addresses the current state of speech recognition using the “Corpus of Spontaneous Japanese (CSJ)”. It is shown that the large-scale corpus had strong impact in training acoustic and language models considering morphological and pronunciation variations which are characteristic to spontaneous Japanese. Unsupervised adaptation of these models and the speaking rate is also effec...
متن کاملAutomatic Speech Transcription and Archiving System using the Corpus of Spontaneous Japanese
The target of automatic speech recognition (ASR) research has been shifted from read speech to spontaneous speech. The technology will realize automatic transcription (and translation) of lectures and meetings. In Japan, ”Spontaneous Speech” project has been conducted in last five years, and we set up the huge ”Corpus of Spontaneous Japanese (CSJ)”, which consists of over 2000 speeches (500 hou...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2008